Our presentation video link: https://www.dropbox.com/s/e04cl4d76pjm0rb/5206Presentation.mov?dl=0

Introduction: Why Fertility Rate?

The labor force has long been central to economists’ accounts of economic growth. The relationship is formalized in the Solow-Swan model, a neoclassical (exogenous) growth model: \[Y_t = K_t^\alpha(A_tL_t)^{1-\alpha}\] where \(t\) denotes time, \(0 < \alpha < 1\) is the elasticity of output with respect to capital, \(Y_t\) is total production, \(K_t\) is capital, and \(L_t\) is labor. \(A_t\) refers to labor-augmenting technology or “knowledge”, so \(A_tL_t\) represents effective labor.
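As a short worked step, taking logs of the production function and differentiating with respect to time makes the role of labor explicit. The notation \(g_X\) for the growth rate of a variable \(X\) is introduced here for illustration and is not part of the original presentation:

\[\ln Y_t = \alpha \ln K_t + (1-\alpha)(\ln A_t + \ln L_t) \quad\Longrightarrow\quad g_Y = \alpha\, g_K + (1-\alpha)(g_A + g_L)\]

Holding capital and technology growth fixed, slower labor force growth \(g_L\) therefore translates directly into slower output growth, with weight \(1-\alpha\).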

Robert Solow and Trevor Swan (1956) try to explain economic growth through capital, labor, and knowledge, assuming that the technology level is exogenous and the same across countries. They conclude that differences in long-run growth rates of GDP per capita across countries reflect differences in capital accumulation, labor force, and population.

The fertility rate determines how quickly the labor force is replenished, so the accumulation of the labor force is closely tied to it. In recent decades, however, the fertility rate has declined in many countries. In some developed countries, such as the United States, Korea, and Japan, it has fallen below the replacement rate.


Fertility Rate All Over the World

This means that such countries face problems brought on by low fertility, such as an aging population and slower economic growth. We are therefore interested in what is related to the fertility rate, and we aim to make a recommendation that alleviates the problems caused by declining fertility. In addition, house prices have increased dramatically in many countries over the past few decades. The upward trend is easy to see in the global real house price index.


Housing Price Index

Some people argue that rising house prices are what lower the fertility rate. However, the plot below suggests a positive relationship. We would like to find out, using a country-level dataset, whether high housing prices decrease the fertility rate.


Fertility Rate vs. Housing Price Index

Data Description & Source

Variable  Description                                                               Source
fer       Fertility rate, total (births per woman)                                  World Bank
gdp       GDP                                                                       World Bank
lab       Labor force participation rate, female (% of female population ages 15+)  World Bank
edu       School enrollment, secondary, female (% gross)                            World Bank
cpi       Consumer Price Index                                                      World Bank
unemp     Unemployment Rate                                                         World Bank
pop       Population                                                                World Bank
hou       Housing Price Index                                                       OECD
ten       Housing Tenure                                                            UN Data, American Housing Survey
sav       Household Savings, % of household disposable income                       OECD Data
asset     Household Financial Assets, US dollars/capita                             OECD Data

To analyze the potential causal effect of housing price on the fertility rate, we try to handle the factors that confound the two. We combine two methods. First, fixed/random effect regression on panel data can reduce time-invariant confounding. Second, following the study of Cevat Giray Aksoy (2016), we treat personal/household wealth as a main confounder between fertility and housing price, so we include typical variables representing household wealth, such as household financial assets and household savings. We start with an ordinary linear regression.

Data Preparation

Import some packages and load data into our environment!

library(tidyverse)
library(readxl)
library(VIM)
library(imputeTS)
library(broom)
library(knitr)
library(olsrr)
library(MASS)
library(psych)
library(jtools)
library(boot)
library(plm)
fer <- read_csv("fertility.csv")
gdp <- read_excel("gdp.xls")
hou <- read_csv("house_price.csv")
lab <- read_csv("female_labor_force_participation.csv")
edu <- read_csv("school_enrollment_secondary.csv")
ten <- read_csv("house_tenure.csv")
cpi <- read_csv("cpi.csv")
unemp <- read_csv("unemployment.csv")
pop <- read_csv("population.csv")
asset <- read_csv("financial_asset.csv")
sav <- read_csv("household_saving.csv")

Do some data cleaning and combination!

fer <- fer %>%
  dplyr::select(-names(fer)[2:4]) %>%
  rename(country = `Country Name`)
fer <- fer %>%
  pivot_longer(names(fer)[-1], names_to = "year", values_to = "fer")
fer$year <- as.numeric(fer$year)

gdp <- gdp %>%
  dplyr::select(-names(gdp)[2:4]) %>%
  rename(country = `Country Name`)
gdp <- gdp %>%
  pivot_longer(names(gdp)[-1], names_to = "year", values_to = "gdp")
gdp$year <- as.numeric(gdp$year)

hou <- hou %>%
  dplyr::select(Country, Time, Value) %>%
  filter(str_length(Time) == 4) %>%
  rename(country = Country, year = Time, hou = Value) %>%
  group_by(country, year) %>%
  summarize(hou = mean(hou))
hou$year <- as.numeric(hou$year)

lab <- lab %>%
  dplyr::select(-names(lab)[2:4]) %>%
  rename(country = `Country Name`)
lab <- lab %>%
  pivot_longer(names(lab)[-1], names_to = "year", values_to = "lab")
lab$year <- as.numeric(lab$year)

edu <- edu %>%
  dplyr::select(-names(edu)[2:4]) %>%
  rename(country = `Country Name`)
edu <- edu %>%
  pivot_longer(names(edu)[-1], names_to = "year", values_to = "edu")
edu$year <- as.numeric(edu$year)

# housing tenure: share of housing units owned by a member of the household
ten <- ten %>%
  filter(Area == "Total" & `Type of housing unit` == "Total") %>%
  rename(country = `Country or Area`, year = Year, tenure = Tenure, value = Value) %>%
  dplyr::select(country, year, tenure, value) %>%
  pivot_wider(names_from = tenure, values_from = value) %>%
  mutate(ten = `Member of household owns the housing unit` / Total) %>%
  dplyr::select(country, year, ten) %>%
  drop_na()

# add United States values manually; they come from newer, smaller datasets/websites
ten <- ten %>% 
  add_row(country = "United States", year = 2019, ten = 79475/124135) %>%
  add_row(country = "United States", year = 2017, ten = 77567/121560) %>%
  add_row(country = "United States", year = 2015, ten = 74299/118290) %>%
  add_row(country = "United States", year = 2013, ten = 75650/115852) %>%
  add_row(country = "United States", year = 2011, ten = 76053/114833)
ten <- ten %>%
  group_by(country) %>%
  summarize(ten = mean(ten))

cpi <- cpi %>%
  dplyr::select(-names(cpi)[2:4]) %>%
  rename(country = `Country Name`)
cpi <- cpi %>%
  pivot_longer(names(cpi)[-1], names_to = "year", values_to = "cpi")
cpi$year <- as.numeric(cpi$year)

unemp <- unemp %>%
  dplyr::select(-names(unemp)[2:4]) %>%
  rename(country = `Country Name`)
unemp <- unemp %>%
  pivot_longer(names(unemp)[-1], names_to = "year", values_to = "unemp")
unemp$year <- as.numeric(unemp$year)

pop <- pop %>%
  dplyr::select(-names(pop)[2:4]) %>%
  rename(country = `Country Name`)
pop <- pop %>%
  pivot_longer(names(pop)[-1], names_to = "year", values_to = "pop")
pop$year <- as.numeric(pop$year)

fer_temp <- read_csv("fertility.csv")
fer_temp <- fer_temp %>%
  dplyr::select(`Country Name`, `Country Code`)
asset <- asset %>%
  filter(SUBJECT == "TOT") %>%
  rename(`Country Code` = LOCATION) %>%
  left_join(fer_temp) %>%
  rename(country = `Country Name`, year = TIME, asset = Value) %>%
  dplyr::select(country, year, asset)
sav <- sav %>%
  rename(`Country Code` = LOCATION) %>%
  left_join(fer_temp) %>%
  rename(country = `Country Name`, year = TIME, sav = Value) %>%
  dplyr::select(country, year, sav)
  
data_joined <- fer %>%
  inner_join(gdp) %>%
  inner_join(hou) %>%
  inner_join(lab) %>%
  inner_join(edu) %>%
  inner_join(cpi) %>%
  inner_join(unemp) %>%
  inner_join(pop) %>%
  inner_join(asset) %>%
  inner_join(sav) %>%
  left_join(ten) %>%
  drop_na()
data <- data_joined %>%
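  mutate(
    gdp_per_log = log(gdp / pop),  # log GDP per capita (uses the raw population)
    hou = hou / 100,               # index -> units of 100 index points
    lab = lab / 100,               # percent -> proportion
    edu = edu / 100,               # percent -> proportion
    unemp = unemp / 100,           # percent -> proportion
    pop = log(pop),                # log population
    cpi = cpi / 100,               # index -> units of 100 index points
    sav = sav / 100,               # percent -> proportion
    asset = log(asset)             # log financial assets per capita
  ) %>%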
  dplyr::select(-gdp)
data_for_linear <- data %>%
  dplyr::select(-country, -year)

Before setting up our model, we rescaled some variables. We compute GDP per capita as GDP/population because it is a standard comparative indicator of economic performance and helps us compare individual living standards across countries. We then take the log of GDP per capita to compress its range. The population and household financial assets are also logged. Finally, to put the remaining predictors on comparable scales, we divide the variables reported in percent (lab, edu, unemp, sav) and the two indices (hou, cpi) by 100, which converts percentages to proportions and expresses each index in units of 100 index points.
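The rescaling can be traced on a tiny example. The sketch below uses a made-up two-row data frame (the values are hypothetical and not taken from the project data) and base R only, so that each transformation is visible on its own:

```r
# Toy data with made-up values (hypothetical, for illustration only)
toy <- data.frame(
  gdp = c(2.0e12, 5.0e11),  # total GDP
  pop = c(6.0e7, 1.0e7),    # population
  lab = c(55, 60),          # female labor force participation, in percent
  hou = c(95, 120)          # housing price index, in index points
)

# GDP per capita must be computed before pop is overwritten by its log
toy$gdp_per_log <- log(toy$gdp / toy$pop)  # log GDP per capita
toy$pop <- log(toy$pop)                    # log population
toy$lab <- toy$lab / 100                   # percent -> proportion
toy$hou <- toy$hou / 100                   # index -> units of 100 index points
toy$gdp <- NULL                            # raw GDP is no longer needed

print(toy)
```

The order of the steps matters: the log of GDP per capita relies on the raw population, which is why it is computed first.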

A First Glance: Scatter Plot and Full Model

pairs.panels(data_for_linear)

From the scatter plot matrix, we would expect housing price to have a negative relationship with the fertility rate.

Let’s look at the full linear regression model first.

full <- lm(fer ~ ., data = data_for_linear)
summary(full)
## 
## Call:
## lm(formula = fer ~ ., data = data_for_linear)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.52421 -0.19222 -0.01779  0.15653  0.75660 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.25283    0.30196   0.837 0.402854    
## hou          0.02194    0.06043   0.363 0.716705    
## lab          0.96154    0.19617   4.901 1.33e-06 ***
## edu          0.26407    0.06852   3.854 0.000133 ***
## cpi         -0.82767    0.56988  -1.452 0.147091    
## unemp       -2.50697    0.32344  -7.751 6.04e-14 ***
## pop          0.06749    0.00889   7.591 1.82e-13 ***
## asset       -0.14102    0.03304  -4.268 2.41e-05 ***
## sav          0.22818    0.23152   0.986 0.324856    
## ten          0.13802    0.11772   1.172 0.241621    
## gdp_per_log  0.10899    0.03579   3.045 0.002463 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2373 on 454 degrees of freedom
## Multiple R-squared:  0.3131, Adjusted R-squared:  0.298 
## F-statistic: 20.69 on 10 and 454 DF,  p-value: < 2.2e-16

Next, let’s run some simple checks of the assumptions behind the ordinary linear regression model.

Diagnostics

Assumptions of Linear Regressions

What can go wrong?

Our regression model requires some assumptions.

Residuals should:

  • be normally distributed.

  • be independent.

  • have the same variance.

Basic idea of the diagnostic measures: if the model is correct, then the residuals \(e_i = Y_i - \hat{Y}_i, 1\leq i \leq n\), should look like a sample of (not quite independent) \(N(0,\sigma^2)\) random variables.

Therefore, we are going to check all the assumptions.

par(mfrow = c(2,2))
plot(full, pch = 23, bg = 'orange', cex = 1)

From the plots above, the residuals appear approximately normally distributed with roughly constant variance. The Cook’s distance plot shows that some observations are clearly influential.
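Visual checks like these can be complemented with formal tests. The following is a minimal sketch on simulated data (not the project data), using only base R: a Shapiro-Wilk test for normality, a studentized Breusch-Pagan-style statistic computed by hand from an auxiliary regression of the squared residuals, and the common 4/n rule of thumb for Cook’s distance. The variable names and the 4/n cutoff are choices made for this illustration.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)   # simulated data that satisfies the assumptions
fit <- lm(y ~ x)
res <- residuals(fit)

# Normality: Shapiro-Wilk test on the residuals
sw <- shapiro.test(res)

# Constant variance: regress squared residuals on the predictor;
# n * R^2 is compared with a chi-squared distribution (df = number of predictors)
aux <- lm(res^2 ~ x)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

# Influence: flag observations whose Cook's distance exceeds 4 / n
cd <- cooks.distance(fit)
influential <- which(cd > 4 / n)

cat("Shapiro-Wilk p-value:", sw$p.value, "\n")
cat("Breusch-Pagan-style p-value:", bp_p, "\n")
cat("Number of flagged observations:", length(influential), "\n")
```

Applied to the report’s model, `fit` would be replaced by `full` and the auxiliary regression would include all predictors, with the degrees of freedom adjusted accordingly.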

Since this is an ordinary multiple regression that ignores country-level fixed/random effects, it may still suffer from several problems.

Fixed/Random Effect Regression based on Panel Data

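A fixed effect regression controls for omitted variable bias due to unobserved heterogeneity across countries when this heterogeneity is constant over time, which means it helps us reduce time-invariant confounding. A random effect regression models the same country-level heterogeneity but additionally assumes that it is uncorrelated with the predictors. Note that ten is averaged over years within each country in the data preparation step, so it does not vary over time and drops out of the fixed effect (within) model.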
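To see why a fixed effect model removes time-invariant confounding, it helps to look at the within transformation directly. The sketch below uses simulated data with hypothetical country labels and base R only: subtracting each country’s mean from the outcome and the predictor gives the same slope as a regression with one dummy variable per country.

```r
set.seed(42)
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 5),
  x = rnorm(15)
)
# Each country gets its own time-invariant effect (made-up values)
effect <- c(A = 1, B = -2, C = 3)
toy$y <- 0.5 * toy$x + unname(effect[as.character(toy$country)]) +
  rnorm(15, sd = 0.1)

# Within transformation: subtract country means from y and x
toy$x_dm <- toy$x - ave(toy$x, toy$country)
toy$y_dm <- toy$y - ave(toy$y, toy$country)
b_within <- coef(lm(y_dm ~ x_dm - 1, data = toy))[["x_dm"]]

# Equivalent least squares dummy variable regression
b_lsdv <- coef(lm(y ~ x + factor(country), data = toy))[["x"]]

print(c(within = b_within, lsdv = b_lsdv))
```

Any variable that is constant within a country becomes zero after demeaning, which is why time-invariant predictors cannot be estimated in a within model.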

fixed <- plm(fer ~ hou + lab + edu + cpi + unemp + pop + asset + sav + ten + gdp_per_log,
             data = data, index = c("country", "year"), model = "within")
summary(fixed)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = fer ~ hou + lab + edu + cpi + unemp + pop + asset + 
##     sav + ten + gdp_per_log, data = data, model = "within", index = c("country", 
##     "year"))
## 
## Unbalanced Panel: n = 24, T = 4-25, N = 465
## 
## Residuals:
##        Min.     1st Qu.      Median     3rd Qu.        Max. 
## -0.29111671 -0.04162546 -0.00015286  0.04563224  0.19298887 
## 
## Coefficients:
##              Estimate Std. Error  t-value  Pr(>|t|)    
## hou          0.191196   0.034262   5.5803 4.245e-08 ***
## lab          1.303811   0.207217   6.2920 7.697e-10 ***
## edu         -0.235057   0.059760  -3.9333 9.758e-05 ***
## cpi         -0.328527   0.208356  -1.5768  0.115584    
## unemp       -0.050466   0.165056  -0.3057  0.759943    
## pop         -1.289671   0.080468 -16.0272 < 2.2e-16 ***
## asset       -0.042504   0.022756  -1.8679  0.062457 .  
## sav         -0.226603   0.108043  -2.0973  0.036544 *  
## gdp_per_log  0.085504   0.025688   3.3285  0.000948 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    5.3898
## Residual Sum of Squares: 2.6349
## R-Squared:      0.51113
## Adj. R-Squared: 0.47492
## F-statistic: 50.1859 on 9 and 432 DF, p-value: < 2.22e-16
random <- plm(fer ~ hou + lab + edu + cpi + unemp + pop + asset + sav + ten + gdp_per_log,
              data = data, index = c("country", "year"), model = "random")
summary(random)
## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = fer ~ hou + lab + edu + cpi + unemp + pop + asset + 
##     sav + ten + gdp_per_log, data = data, model = "random", index = c("country", 
##     "year"))
## 
## Unbalanced Panel: n = 24, T = 4-25, N = 465
## 
## Effects:
##                    var  std.dev share
## idiosyncratic 0.006099 0.078098 0.076
## individual    0.074098 0.272210 0.924
## theta:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.8580  0.9343  0.9403  0.9360  0.9415  0.9427 
## 
## Residuals:
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.32294 -0.05783 -0.00055 -0.00105  0.06544  0.27091 
## 
## Coefficients:
##              Estimate Std. Error z-value  Pr(>|z|)    
## (Intercept)  4.994666   0.896025  5.5742 2.486e-08 ***
## hou          0.123205   0.041081  2.9991 0.0027080 ** 
## lab          0.331086   0.234509  1.4118 0.1580002    
## edu         -0.227214   0.072371 -3.1396 0.0016920 ** 
## cpi         -0.890455   0.251054 -3.5469 0.0003898 ***
## unemp       -0.541037   0.197443 -2.7402 0.0061399 ** 
## pop         -0.182482   0.041803 -4.3653 1.270e-05 ***
## asset       -0.116410   0.027057 -4.3024 1.690e-05 ***
## sav         -0.033421   0.131063 -0.2550 0.7987247    
## ten         -0.923720   0.652696 -1.4152 0.1569987    
## gdp_per_log  0.152177   0.030650  4.9650 6.871e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    5.7679
## Residual Sum of Squares: 4.166
## R-Squared:      0.27813
## Adj. R-Squared: 0.26223
## Chisq: 148.777 on 10 DF, p-value: < 2.22e-16

Which is better, fixed effect or random effect regression? Let’s run a Hausman test (with the significance level set to 0.05).

phtest(fixed, random)
## 
##  Hausman Test
## 
## data:  fer ~ hou + lab + edu + cpi + unemp + pop + asset + sav + ten +  ...
## chisq = 258.08, df = 9, p-value < 2.2e-16
## alternative hypothesis: one model is inconsistent

Since the p-value is below 0.05, we reject the null hypothesis that both models are consistent and choose the fixed effect model.
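For reference, the statistic behind this test has the following standard textbook form, where \(\hat\beta_{FE}\) and \(\hat\beta_{RE}\) are the fixed and random effect estimates of the coefficients that both models share (this notation is introduced here for illustration):

\[H = (\hat\beta_{FE} - \hat\beta_{RE})^\top \left[\widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE})\right]^{-1} (\hat\beta_{FE} - \hat\beta_{RE}) \sim \chi^2_k\]

Under the null hypothesis both estimators are consistent and \(H\) is small. Here \(k = 9\), matching the nine time-varying predictors that appear in both models.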

Parameter Interpretation

The coefficient of housing price is 0.19. Because hou is the housing price index divided by 100, a one-unit increase in hou (100 index points) is associated with an increase of 0.19 in the fertility rate (births per woman), holding the other predictors fixed. This runs strongly against the view that high housing prices decrease the fertility rate. In fact, according to some research, a house is also a kind of asset with investment value. Rising housing prices increase the wealth of people who already own houses and can therefore strengthen their willingness to have more children. Even for those who do not own houses, the expectation that housing prices will keep rising may have the same effect.
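Because hou enters the model as the housing price index divided by 100, it can help to translate the coefficient into a more modest change. The short calculation below takes the fixed effect estimate reported above and a hypothetical 10-point rise in the index:

```r
beta_hou <- 0.191196   # fixed effect estimate for hou reported above
index_change <- 10     # hypothetical rise in the housing price index, in points

# hou is index / 100, so a 10-point rise is a 0.10 change in hou
delta_fer <- beta_hou * (index_change / 100)
print(round(delta_fer, 4))   # 0.0191
```

Under the model, a 10-point rise in the index is associated with roughly 0.02 more births per woman, a small change relative to fertility rates in the sample.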

A question also arises: if a country’s housing price is already high, does a further increase still raise the fertility rate? We explore this in the following parameter uncertainty part.

Parameter Uncertainty

Parameters of High and Low House Pricing Groups

We divided the data into two groups by the median of house price to see how the coefficients change within different groups.

data_low_hou <- data %>%
  filter(hou <= median(data$hou))
data_high_hou <- data %>%
  filter(hou > median(data$hou))
model_low_hou <- plm(fer ~ hou + lab + edu + cpi + unemp + pop + asset + sav + ten + gdp_per_log, data = data_low_hou, index = c("country", "year"), model = "within")
model_high_hou <- plm(fer ~ hou + lab + edu + cpi + unemp + pop + asset + sav + ten + gdp_per_log, data = data_high_hou, index = c("country", "year"), model = "within")
export_summs(fixed, model_low_hou, model_high_hou, model.names = c("Full Data", "Low Housing Price Group", "High Housing Price Group"))
                 Full Data    Low Housing Price Group    High Housing Price Group
hou              0.19 ***     0.23 ***                   0.01
                 (0.03)       (0.06)                     (0.07)
lab              1.30 ***     0.02                       3.09 ***
                 (0.21)       (0.26)                     (0.40)
edu              -0.24 ***    -0.15                      -0.14
                 (0.06)       (0.09)                     (0.07)
cpi              -0.33        0.52 **                    -0.50
                 (0.21)       (0.20)                     (0.32)
unemp            -0.05        -0.16                      -0.68 **
                 (0.17)       (0.18)                     (0.26)
pop              -1.29 ***    -1.15 ***                  -1.36 ***
                 (0.08)       (0.11)                     (0.17)
asset            -0.04        -0.02                      -0.12 ***
                 (0.02)       (0.03)                     (0.03)
sav              -0.23 *      -0.14                      0.38 *
                 (0.11)       (0.11)                     (0.16)
gdp_per_log      0.09 ***     0.09 ***                   0.03
                 (0.03)       (0.02)                     (0.04)
nobs             465          233                        232
r.squared        0.51         0.56                       0.53
adj.r.squared    0.47         0.49                       0.45
statistic        50.19        28.34                      24.77
p.value          0.00         0.00                       0.00
deviance         2.63         0.49                       0.88
df.residual      432.00       201.00                     199.00
nobs.1           465.00       233.00                     232.00
*** p < 0.001; ** p < 0.01; * p < 0.05.

Visualization of Coefficients

plot_summs(fixed, model_low_hou, model_high_hou, scale = TRUE, robust = TRUE, inner_ci_level = 0.9, model.names = c("Full Data", "Low Housing Price Group", "High Housing Price Group"), coefs = "hou")

From the table and plot, we can see that the coefficient of housing price is much larger in the low housing price group (0.23), while it is close to 0 in the high housing price group (0.01). This suggests that when housing prices are already high, further increases do not stimulate people to have more children; the incentive almost disappears.

Parameters of More and Less Financial Asset Groups

We divided the data into two groups by the median of household financial assets.

data_poor <- data %>%
  filter(asset <= median(data$asset))
data_rich <- data %>%
  filter(asset > median(data$asset))
model_poor <- plm(fer ~ hou + lab + edu + cpi + unemp + pop + asset + sav + ten + gdp_per_log, data = data_poor, index = c("country", "year"), model = "within")
model_rich <- plm(fer ~ hou + lab + edu + cpi + unemp + pop + asset + sav + ten + gdp_per_log, data = data_rich, index = c("country", "year"), model = "within")
export_summs(fixed, model_poor, model_rich, model.names = c("Full Data", "Less Asset Group", "More Asset Group"))
                 Full Data    Less Asset Group    More Asset Group
hou              0.19 ***     0.11 *              0.19 ***
                 (0.03)       (0.05)              (0.06)
lab              1.30 ***     -0.07               1.15 ***
                 (0.21)       (0.41)              (0.31)
edu              -0.24 ***    -0.14               -0.14
                 (0.06)       (0.10)              (0.09)
cpi              -0.33        -0.10               -0.44
                 (0.21)       (0.23)              (0.48)
unemp            -0.05        -0.43 *             0.77 *
                 (0.17)       (0.22)              (0.30)
pop              -1.29 ***    -1.46 ***           -2.00 ***
                 (0.08)       (0.15)              (0.26)
asset            -0.04        0.03                0.01
                 (0.02)       (0.03)              (0.05)
sav              -0.23 *      -0.27 *             -0.25
                 (0.11)       (0.13)              (0.27)
gdp_per_log      0.09 ***     0.06                0.20 ***
                 (0.03)       (0.03)              (0.04)
nobs             465          233                 232
r.squared        0.51         0.60                0.46
adj.r.squared    0.47         0.55                0.40
statistic        50.19        34.49               19.88
p.value          0.00         0.00                0.00
deviance         2.63         1.14                1.03
df.residual      432.00       203.00              208.00
nobs.1           465.00       233.00              232.00
*** p < 0.001; ** p < 0.01; * p < 0.05.

Visualization of Coefficients

plot_summs(fixed, model_poor, model_rich, scale = TRUE, robust = TRUE, inner_ci_level = 0.9, model.names = c("Full Data", "Less Asset Group", "More Asset Group"), coefs = "hou")

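Comparing the two groups, the coefficient of housing price is larger in the more asset group (0.19) than in the less asset group (0.11), which suggests that households that already own more financial assets may be more stimulated by rising housing prices. This is consistent with the guess above.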
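One rough way to gauge whether the subgroup coefficients really differ is a z-test on their difference. The sketch below uses the rounded estimates and standard errors from the two tables above. The helper `coef_diff_z` is a hypothetical name introduced here, and the test treats the two subsamples as independent, which is only an approximation because the same countries can contribute observations to both groups.

```r
# Approximate z-test for the difference between two coefficient estimates,
# assuming the two estimates are independent (an approximation here)
coef_diff_z <- function(b1, se1, b2, se2) {
  z <- (b1 - b2) / sqrt(se1^2 + se2^2)
  p <- 2 * pnorm(abs(z), lower.tail = FALSE)
  c(z = z, p = p)
}

# hou coefficient: low vs. high housing price group
house_split <- coef_diff_z(0.23, 0.06, 0.01, 0.07)

# hou coefficient: more vs. less asset group
asset_split <- coef_diff_z(0.19, 0.06, 0.11, 0.05)

print(round(rbind(house_split, asset_split), 3))
```

With these rounded inputs, the z statistic is about 2.39 for the housing price split and about 1.02 for the asset split. Under the independence assumption, the first difference is distinguishable from zero at the 5% level while the second is not, so the asset comparison should be read as suggestive only.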

Limitations & Future Work

Conclusion & Recommendation

All in all, based on our fixed effect regression model (and even the ordinary linear regression model), we conclude that although house prices are related to the fertility rate, high house prices do not lower it as many people believe. We therefore recommend that policymakers not try to lower house prices merely to encourage people to have more children. This study still has unresolved limitations that need future research, but it aims to inform policymakers about the relationship between house prices and the fertility rate so that they can understand it better and make sound decisions.